ABSTRACT

Instructional improvement within the context of
criterion-referenced and norm-referenced tests is described. Such
categories overemphasize test interpretation rather than design
characteristics of achievement tests. Data from most measurement
situations may be reported or interpreted either according to
criterion- or norm-referenced standards. How the test is developed
and what it represents are of critical importance. The paper proposes
alternative conceptualizations of test design: construct-referenced,
objectives-referenced and domain-referenced. Using student data, the
teacher needs to identify deficiencies in achievement, possible
explanations, and remedies, and to put the remedies into operation. An
analysis of the utility of each test type results in the appraisal
that domain-referenced tests provide the most information for
teachers and therefore are the most desirable as data sources for
instructional improvement. However, because of lack of knowledge
about instruction, poor training in available instructional
principles, and lack of resources to encourage changes in
instructional habits, it is concluded that instructional improvement,
even if measurement considerations were satisfied, is not imminent.
(Author/DJ) 
University of California, Los Angeles

In uninventive fashion, I shall begin with a list of definitions,
qualifications, caveats and platitudes to place later references in context.

First, the term "instruction." Assume that we mean the arrangement
of conditions and events through which learning is presumably [illegible].
For this discussion, accept the admittedly limited definition of
instructional improvement in terms of pupil growth on some measure, rather
than a refinement of a prescribed set of teacher behaviors.

Instruction can be mediated by a teacher, a set of materials, or some
combination of the two. Based on a measurement, the teacher alters procedures
in some way to effect better results. Similarly, the designers of materials
will rework them, or their support systems, to produce better pupil
performance on a subsequent measurement. Instructional improvement is usually
conceived in terms of the particular curriculum goals of the institution,
for instance, a teacher's ability to bring about reading gains or the
effectiveness of materials in teaching classification of concepts. Such
instruction operates in a network of constraints. Happenstance, such as
whether the children previously had a good arithmetic teacher (that is, did
they learn arithmetic well?) plays an important role and limits the extent
to which a teacher can determine or improve his or her instructional competence
in teaching higher mathematical principles. Such curriculum-linked instruction
is also naturally wedded to the available or approved instructional texts






Paper presented at a Symposium, "The Relative Strengths of Norm-Referenced
and Criterion-Referenced Achievement Tests," of the Annual Meeting of the
American Psychological Association, Honolulu, September, 1972.
and aids. If a school district provides insufficient numbers or inferior
texts for students' use, it is likely that, for many teachers, instructional
improvement is circumscribed.

Whether or not it is fully constrained by school limitations, instructional
improvement needs, as a point of departure, a set of measurements. How can
these measures be employed to improve the learning in the schools? The
distinction between norm-referenced and criterion-referenced tests is not
helpful to me, for the terms over-emphasize interpretation of the tests.

More important is the basis for test construction and the instructional
implications which flow from design, rather than test interpretation. Obviously
data from most measurement situations may be reported or interpreted by
comparing the number of items obtained against the number of items available
or with any other arbitrary standard; test data may also be reported by
comparing a child's or group's achievement level (whatever it was) to performance
of other children. The critical factor in instruction is not how the results
are portrayed, for that is a subsequent problem, but how they are obtained
and what they presumably represent. If norm- and criterion-referenced tests
are not appropriate descriptors to differentiate among the design
characteristics of tests for use in instructional settings, perhaps other
categories should be explored.
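
The point that one set of test data admits either kind of report can be
made concrete with a small sketch. It is only an illustration under stated
assumptions: the function names, the 25-item test, and the peer scores are
all hypothetical, and percentile rank is taken here as the percentage of
peers scoring strictly below the child, which is one of several conventions.

```python
from bisect import bisect_left


def percent_correct(raw_score: int, items_available: int) -> float:
    """Criterion-style report: items obtained against items available."""
    return 100.0 * raw_score / items_available


def percentile_rank(raw_score: int, peer_scores: list[int]) -> float:
    """Norm-style report: percentage of peers scoring strictly below raw_score."""
    ranked = sorted(peer_scores)
    return 100.0 * bisect_left(ranked, raw_score) / len(ranked)


# Hypothetical data: one child's raw score on a 25-item test, plus ten peers.
child = 17
peers = [9, 11, 12, 14, 15, 17, 18, 20, 21, 23]

print(percent_correct(child, 25))     # same raw score, criterion-style report
print(percentile_rank(child, peers))  # same raw score, norm-style report
```

Neither number says anything about how the 25 items were chosen or what
they represent, which is the design question taken up next.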

Instead of "norm-referenced" tests, I suggest construct-referenced to
describe achievement tests which consist of a wide variety of item types
and a relatively well-sampled content range. Such a label is intended to be
independent of the manner in which the tests are ultimately interpreted, but
could probably be applied to many present commercially produced and widely
used achievement tests.
Labeling what passes for "criterion-referenced" tests is more difficult.
The first alternate title is objective-referenced tests. However, such
designation is unfortunately misleading, for it does not follow that, if
one has an objective based on observable behavior, one will produce
homogeneous test items which relate to the objective. In fact, since the content
specifications are often poor, one can depend only on the fact that item
formats of objectives-referenced tests will be similar, e.g., all short
answer, all multiple choice with four options.

Domain-referenced tests are a substantial refinement over
objectives-referenced tests. (See Hively, et al., 1968, 1971.) Instead of a
"behavioral" or performance objective emphasizing, for instance, that the
learner will be able to pronounce phonemic combinations, a domain specifies
both the performance the learner is expected to demonstrate and the content
domain to which the performance is to generalize. In the pronunciation example
either the content of interest (sh, th) or a generation rule (all ending
blends) for content is provided. Such tests attempt to clarify what it is
they are attempting to measure and to provide a fuller basis for revision by
a potential user.

To summarize, consider three different types of achievement tests for
instructional improvement: construct-referenced, objective-referenced and
domain-referenced. The emphasis in construct-referenced tests is on providing
a full range of content and behaviors relevant to a construct such as
computational ability. The emphasis of objectives-referenced tests has been on
providing items which exhibit similar response requirements related to an
often poorly defined content area, e.g., an objective which states the child
will be able to write the theme of an essay when the critical properties of
essays are not explained. The domain-referenced test includes items which
conform to a particular response requirement, such as pronunciation, and
provides a description as well as the class of content to which the
performance is presumably to generalize, i.e., consonant-vowel-consonant
words.
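
A domain specification of this kind, a performance paired with either
listed content or a rule that generates content, can be sketched in a few
lines. This is an illustrative sketch only, not Hively's procedure: the
names Domain, explicit, cvc_words and draw_items are hypothetical, and the
rule below produces every consonant-vowel-consonant string from the letters
supplied, whether or not it is a real word.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Domain:
    """Hypothetical domain specification: a performance plus its content."""
    performance: str                      # what the learner must do
    content: Callable[[], Sequence[str]]  # rule that enumerates the content


def explicit(items: Sequence[str]) -> Callable[[], Sequence[str]]:
    """Content given as a fixed list, e.g. the digraphs of interest."""
    return lambda: list(items)


def cvc_words(consonants: str, vowels: str) -> Callable[[], Sequence[str]]:
    """Content given by a generation rule: all consonant-vowel-consonant strings."""
    return lambda: [
        "".join(letters)
        for letters in itertools.product(consonants, vowels, consonants)
    ]


def draw_items(domain: Domain, n: int, seed: int) -> list[str]:
    """Sample n distinct test items from the domain's content."""
    pool = list(domain.content())
    return random.Random(seed).sample(pool, n)


digraphs = Domain("pronounce aloud", explicit(["sh", "th"]))
cvc = Domain("pronounce aloud", cvc_words("bt", "a"))

print(sorted(cvc.content()))         # the whole content class is inspectable
print(draw_items(cvc, 2, seed=1))    # a test form is a sample from that class
```

Because the content class is written down, a user can see exactly what a
score generalizes to, and practice items can be drawn from the same class
without reusing the test items themselves.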

The three test types have political implications as well.
Construct-referenced tests, by their published titles, promise grand things, for
they measure areas like "critical reading," and "scientific concepts."
Children who perform poorly in such measures are treated with head-shaking
pity. Objective-referenced tests also contract for more than they deliver.
A test which measures the child's ability to derive meaning from paragraphs
by answering questions will surely miss a range of paragraph and question
complexity which critics feel is important. But because an objective has
been written, it may appear to a user, such as a school board, that there is
a great deal of specificity in the goal and thus someone (teachers) should
know enough to achieve it. Tests which appear precise but are not can
seriously mislead teachers and administrators.

Domain-referenced tests have not been developed frequently enough to
promote predictable responses in users. However, since their content is so
well defined one would expect a fuller congruence with the user's needs and
the test's purposes. Domain-referenced tests are so time-consuming to produce
that only a relatively few will ever be satisfactorily written, and those
only for critical, consensus objectives.

If teachers had useful information from tests, so the story goes,
instructional improvement would follow. Approaches to instruction, characterized by
their proponents as "decision-oriented," "competency-based," "rational," or
"systematic" are centered on the premise that if teachers could be provided
with valid information on the performance of their students, they would
be able to adapt their instruction and successfully remediate.

Posit a minimum set of events and knowledge that a teacher needs in order
to implement an instructional improvement cycle:

Step 1. Data on students' abilities to perform skills and behaviors.

Step 2. Ability to identify deficiencies in students' achievement.

Step 3. Ability to identify possible explanations for these deficiencies.

Step 4. Ability to identify alternative remedial sequences.

Step 5. Ability to implement such sequences.

Such an event set requires, at minimum, compliance by students, something
frequently not guaranteed. Compare the three types of tests, construct-,
objectives- and domain-referenced, in terms of how they might facilitate the
instructional cycle. All three tests provide useful (Step 1) data.
Construct-referenced tests are presently most respectably developed. They are, however,
administered on a schedule not normally consistent with continuing diagnosis,
and are often reported in terms of the child's status with respect to other
children rather than his or her own particular strengths and weaknesses. Still,
a teacher could get a general idea about learners' proficiency.
Objectives-referenced tests may be scheduled more regularly and provide data which
appear to give information about what the child can do, but because the
content analysis in the test design is usually weak, these tests may not
provide serious assistance in helping a teacher to identify actual competencies.
Domain-referenced tests represent an improvement in the quality of information
they provide, in that the range of instances to which a learner is able to
perform is explicit. Data from such tests are "enabling;" if teachers would,
they could identify with increased explicitness what the students are
able to deal with. Identification of performance deficiencies (Step 2) is
theoretically possible through the use of all three tests. Since arbitrary
judgments are usually invoked in deciding on what constitutes a deficiency,
that is, the 44th percentile is bad, or 68 percent is unsatisfactory, none
of the test types seriously advantages the user. Even if there
were defensible procedures for determining cut-off points to define "deficiency,"
we would halt the analysis of the utility of measurement to foster instructional
improvement.

Even if test producers get very busy and produce a range of exciting,
important and valid achievement instruments, many teachers would be unable to
put the data produced to reasonable use (Steps 3, 4, 5) for the following
reasons. First, only limited knowledge is available in the instructional field.
Even agencies with talented instructional designers treat each development
task, in large measure, as wholly idiosyncratic and employ heuristic
test-and-revision cycles in the validation of materials. Well-researched instructional
principles exist in only limited, and largely operant, clusters.

Secondly, even where instructional design principles exist, they are not
disseminated. Although many teachers function well without arcane knowledge
from instructional research, less gifted teachers might be able to put such
knowledge to use, but they have no access to the fount. Teacher training, in
disarray for years, has not yet provided adequate preparation for many.
Coordinators of in-service education of teachers rarely have sufficient resources
to provide training. When well-funded, expertly staffed instructional development
agencies spend considerable time designing and redesigning satisfactory
instruction in one or two areas, why should one expect a single teacher, modestly
trained, to be able to do as well in many subject matters with few resources?






Beyond the paucity of instructional principles and the dearth of
training is the nature of the individual teacher's predispositions. Even
if (1) good data were available, (2) reasonable "criterion" levels were
agreed to, (3) instructional principles existed, and (4) teachers knew how
to use such principles and adapt them to given situations, habitual
instructional routines will need to be overcome. Teachers will need incentives,
support, and rewards if they are to change significantly their present
practices. In fact, since most accountability systems use the threat of
punishment rather than incentives as a basis for fostering teaching
improvement, one could become even more pessimistic about the likelihood of
facilitating teacher change.

If analysis leads one to believe that, even with measurement advances,
instructional improvement would not inevitably follow, what implications
are there for research and development activity in test design? Further,
in the present accountability surge what can instructional and measurement
experts do to help both the teachers and the students? Clearly, construct-
type tests will continue to be used to give a broad, comparative picture of
school achievement and they should be. Objectives-referenced tests may be
appropriate for individual teachers to use to measure their pupils' progress
and their own achievement of certain goals of high personal interest. They
should probably be locally prepared, since technical quality of the tests
will necessarily relax, and results should be of interest on a personal
classroom-feedback level only. Domain-referenced tests are those tests which
may be employed for evaluation, as in accountability, where improvement is
expected. The use of domain-referenced data gives the teachers most
assistance, for they are provided with clear information about what
kind of practice items are in the set of content and performance measured
by the test. One might expect that teachers could be easily prepared to
provide instructional situations that allow students to practice content
from the appropriate set without permitting the students to have experience
with the test items themselves. Domain-referenced tests are difficult to
prepare, particularly because not all subject matter is presently analyzed
in a way to permit the preparation of such tests. If experts in American
government insist that there are in fact three functions of the executive
branch in the United States, then no amount of analysis by skilled
psychometricians to come to deeper truths is worthwhile. Where subject matter
experts cannot provide appropriate and generalizable dimensions for the
analysis of subject matter, psychometricians should not bear the burden of
the trivia. It is not their problem. Perhaps, after all, such objectives
should not be measured in any organized or institutional sense. I would
suggest that relatively few areas be identified for accountability-teacher
improvement testing. Basic reading and arithmetic speed into focus here.

Beyond those two areas, I would suggest domain-referenced or objective-based
measurement be publicly used very sparingly. Other process-type measures
could tell the taxpayer if the teachers are performing adequately. Until
teachers are trained and willing to use appropriate instructional strategies,
the quest for valid achievement measurements will remain a challenging problem,
but one functionally irrelevant to the arena of instructional improvement.